Abstract

We present GLSL implementations of Perlin noise and Perlin simplex noise
that run fast enough for practical consideration on current generation GPU
hardware. The key benefits are that the functions are purely computational,
i.e. they use neither textures nor lookup tables, and that they are
implemented in GLSL version 1.20, which means they are compatible with all
current GLSL-capable platforms, including OpenGL ES 2.0 and WebGL 1.0. Their
performance is on par with previously presented GPU implementations of
noise, they are very convenient to use, and they scale well with increasing
parallelism in present and upcoming GPU architectures.




Figure 1: 2D and 3D simplex noise (S2D, S3D) and 2D and 3D classic noise (C2D, 
C3D) on a sphere, and a swirling fire shader using several noise components. 



1 Introduction and background 

Perlin noise [1] [3] is one of the most useful building blocks of procedural
shading in software. The natural world is largely built on or from stochastic
processes, and manipulation of noise allows a variety of natural materials
and environments to be procedurally created with high flexibility, at minimal
labor and at very modest computational costs. The introduction of Perlin
noise revolutionized the offline rendering of artificially created worlds.

Hardware shading has not yet adopted procedural methods to any significant 
extent, because of limited GPU performance and strong real time constraints. 



McEwan et al: Efficient computational noise in GLSL






However, with the recent rapid increase in GPU parallelism and performance, 
texture memory bandwidth is often a bottleneck, and procedural patterns are 
becoming an attractive alternative and a complement to traditional image-based 
textures. 

Simplex noise [2] is a variation on classic Perlin noise, with the same general 
look but with a different computational structure. The benefits include a lower 
computational cost for high dimensional noise fields, a simple analytic derivative, 
and an absence of axis-aligned artifacts. Simplex noise is a gradient lattice noise 
just like classic Perlin noise and uses the same fundamental building blocks. 
Some examples of noise on a sphere are shown in Figure 1.

This presentation assumes the reader is familiar with classic Perlin noise and 
Perlin simplex noise. A summary of both is presented in [S]. We will focus on 
how our approach differs from software implementations and from the previous 
GLSL implementations in jHIS]. 

2 Platform constraints 

GLSL 1.20 implementations usually do not allow dynamic access of arrays in
fragment shaders, lack support for 3D textures and integer texture lookups,
have no integer logic operations, and do not optimize conditional code well.
Previous noise implementations rely on many of these features, which limits
their use on these platforms. Integer table lookups implemented by awkward
floating point texture lookups produce unnecessarily slow and complex code
and consume texture resources. Supporting code outside of the fragment shader
is needed to generate these tables or textures, preventing a concise,
encapsulated, reusable GLSL implementation independent of the application
environment. Our solutions to these problems are:

• Replace permutation tables with computed permutation polynomials. 

• Use computed points on a cross polytope surface to select gradients. 

• Replace conditionals for simplex selection with rank ordering. 

These concepts are explained below. The resulting noise functions are
completely self-contained, with no references to external data and requiring
only a few registers of temporary storage.

3 Permutation polynomials 

Previously published noise implementations have used permutation tables or
bit-twiddling hashes to generate pseudo-random gradient indices. Both of these
approaches are unsuitable for our purposes, but there is another way. A
permutation polynomial is a function that uniquely permutes a sequence of
integers under modulo arithmetic, in the same sense that a permutation lookup
table is a function that uniquely permutes a sequence of indices. A more
thorough explanation of permutation polynomials can be found in the online
supplementary material to this article. Here, we will only point out that
useful permutations can be constructed using polynomials of the simple form
(Ax^2 + Bx) mod M. For example, the integers modulo 9 admit the permutation
polynomial (6x^2 + x) mod 9, giving the permutation
(0 1 2 3 4 5 6 7 8) -> (0 7 8 3 1 2 6 4 5).
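
The mod-9 example can be checked directly. The following is a minimal sketch
in Python rather than GLSL, so that it runs on the CPU; the helper names
permutation_polynomial and is_permutation are ours and do not appear in the
article or its source code.

```python
def permutation_polynomial(a, b, m):
    """Return [(a*x*x + b*x) mod m for x in 0..m-1]."""
    return [(a * x * x + b * x) % m for x in range(m)]


def is_permutation(values):
    """True if values contains each of 0..len(values)-1 exactly once."""
    return sorted(values) == list(range(len(values)))


mod9 = permutation_polynomial(6, 1, 9)
print(mod9)                  # [0, 7, 8, 3, 1, 2, 6, 4, 5]
print(is_permutation(mod9))  # True

# A polynomial of the same form that is NOT a permutation, for contrast:
# (x^2 + x) mod 9 maps both 1 and 4 to 2.
print(is_permutation(permutation_polynomial(1, 1, 9)))  # False
```

Not every choice of A and B gives a permutation, which is why the
coefficients have to be selected rather than picked arbitrarily.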

The number of possible polynomial permutations is a small subset of all
possible shufflings, but there are more than enough of them for our purposes.
We need only one that creates a good shuffling of a few hundred numbers, and
the particular one we chose for our implementation is (34x^2 + x) mod 289.

What is more troublesome is the often inadequate integer support in GLSL
1.20, which effectively forces us to use single precision floats to represent
integers. There are only 24 bits of precision to play with (counting the
implicit leading 1), and a floating point multiplication does not drop the
highest bits on overflow. Instead, it loses precision by dropping the low
bits that do not fit and adjusts the exponent. This would be fatal to a
permutation algorithm, where the least significant bit is essential and must
not be truncated in any operation. If the computation of our chosen
polynomial is implemented in the straightforward manner, truncation occurs
when 34x^2 + x > 2^24, or |x| > 702 in the integer domain. If we instead
observe that in modulo-M arithmetic any operand may be reduced modulo M at
any time without changing the result, we can start by mapping x to x mod 289
and then compute the polynomial 34x^2 + x without any risk of overflow. With
this modification, truncation does not occur for any x that can be exactly
represented as a single precision float, and the noise domain is instead
limited by the remaining fractional part precision for the input coordinates.
Any single precision implementation of Perlin noise, in hardware or software,
shares this limitation.
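
The truncation threshold can be reproduced on the CPU. The sketch below is
Python, not GLSL, and emulates single precision by rounding every
intermediate result through a 4-byte float with the standard struct module.
That is an assumption about how a GPU evaluates the expression, not a model
of any particular hardware, and the names f32, permute_f32 and permute_exact
are ours.

```python
import struct


def f32(value):
    """Round a Python float to the nearest IEEE 754 single precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def permute_f32(x, reduce_first):
    """(34*x*x + x) mod 289 with every intermediate rounded to float32."""
    x = f32(x)
    if reduce_first:
        x = f32(x % 289.0)                  # map x to x mod 289 first
    t = f32(f32(f32(x * 34.0) + 1.0) * x)   # ((x*34)+1)*x
    return f32(t % 289.0)


def permute_exact(x):
    """Reference result in exact integer arithmetic."""
    return (34 * x * x + x) % 289


for x in (702, 703, 100000):
    print(x, permute_exact(x), permute_f32(x, False), permute_f32(x, True))
```

In this emulation the unreduced form agrees with the exact result up to
x = 702 and first disagrees at x = 703, where 34x^2 + x = 16,803,809 is an
odd number above 2^24 and therefore not representable. The reduced form
stays exact because its largest intermediate, 9793 * 288 = 2,820,384, is
well below 2^24.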



4 Gradients on N-cross-polytopes

Lattice gradient noise associates pseudo-random gradients with each lattice
point. Previous implementations have used pre-computed lookup tables or bit
manipulations for this purpose. We use a more floating-point friendly way
and make use of geometric relationships between generalized octahedrons in
different numbers of dimensions to map evenly distributed points from an
(N-1)-dimensional cube onto the boundary of the N-dimensional equivalent of
an octahedron, an N-cross polytope. For N = 2, points on a line segment are
mapped to the perimeter of a rotated square, see Figure 2. For N = 3, points
in a square map to an octahedron, see Figure 3, and for N = 4, points in a
cube are mapped to the boundary of a 4-D truncated cross polytope. Equation
(1) presents the mappings for the 2-D, 3-D and 4-D cases.
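
\begin{equation}
\begin{array}{ll}
\text{2-D:} & x_0 \in [-2, 2], \quad y = 1 - |x_0| \\
 & \text{if } y > 0 \text{ then } x = x_0
   \text{ else } x = x_0 - 2\,\mathrm{sign}(x_0) \\[4pt]
\text{3-D:} & x_0, y_0 \in [-1, 1], \quad z = 1 - |x_0| - |y_0| \\
 & \text{if } z > 0 \text{ then } x = x_0,\ y = y_0 \\
 & \text{else } x = x_0 - \mathrm{sign}(x_0),\
   y = y_0 - \mathrm{sign}(y_0) \\[4pt]
\text{4-D:} & x_0, y_0, z_0 \in [-1, 1], \quad
   w = 1.5 - |x_0| - |y_0| - |z_0| \\
 & \text{if } w > 0 \text{ then } x = x_0,\ y = y_0,\ z = z_0 \\
 & \text{else } x = x_0 - \mathrm{sign}(x_0),\
   y = y_0 - \mathrm{sign}(y_0),\ z = z_0 - \mathrm{sign}(z_0)
\end{array}
\tag{1}
\end{equation}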



The mapping for the 4-D case does not cover the full polytope boundary: it
truncates six of the eight corners slightly. However, the mapping covers
enough of the boundary to yield a visually isotropic noise field, and it is a
simple mapping. The 4-D mapping is difficult both to understand and to
visualize, but it is explained in more detail in the supplementary material.
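
The 2-D and 3-D mappings can be sketched and checked numerically. This is an
illustrative Python version with function names of our own choosing, not the
GLSL from the article. For the 2-D case the sketch subtracts 2 sign(x0) when
y is not positive, which is what places the point on |x| + |y| = 1 for x0 in
[-2, 2], and is what the x - floor(x + 0.5) step of the 2-D listing computes
at half scale. The truncated 4-D mapping is not covered.

```python
def sign(v):
    """GLSL-style sign(): -1, 0 or 1."""
    return (v > 0) - (v < 0)


def gradient_2d(x0):
    """Map x0 in [-2, 2] to a point (x, y) on the diamond |x| + |y| = 1."""
    y = 1.0 - abs(x0)
    x = x0 if y > 0 else x0 - 2 * sign(x0)
    return x, y


def gradient_3d(x0, y0):
    """Map (x0, y0) in [-1, 1]^2 to the octahedron |x| + |y| + |z| = 1."""
    z = 1.0 - abs(x0) - abs(y0)
    if z > 0:
        return x0, y0, z
    return x0 - sign(x0), y0 - sign(y0), z


print(gradient_2d(0.5))          # (0.5, 0.5)
print(gradient_2d(1.5))          # (-0.5, -0.5)
print(gradient_3d(0.25, 0.25))   # (0.25, 0.25, 0.5)
print(gradient_3d(0.75, 0.75))   # (-0.25, -0.25, -0.5)
```

In the 3-D sketch the four input points (+-1, 0) and (0, +-1) are degenerate:
z is exactly zero there and sign(0) = 0, so they collapse to the origin. A
sampling of the square that avoids those points is assumed.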




Figure 2: Mapping from a 1-D line segment to the boundary of a 2-D diamond shape. 



Most implementations of Perlin noise use gradient vectors of equal length,
but the longest and shortest vectors on the surface of an N-dimensional cross
polytope differ in length by a factor of sqrt(N). This does not cause any
strong artifacts, because the generated pattern is irregular anyway, but for
higher dimensional noise the pattern becomes less isotropic if the vectors
are not explicitly normalized. Normalization needs only to be performed in an
approximate fashion, so we use the linear part of a Taylor expansion for the
inverse square root 1/sqrt(r) in the neighborhood of r = 0.7. The built-in
GLSL function inversesqrt is likely to be just as fast on most platforms.
Normalization can even be skipped entirely for a slight performance gain.
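
The two constants used by taylorInvSqrt in the 2-D listing follow from this
expansion. With f(r) = r^(-1/2), the tangent line at r0 is
f(r0) + f'(r0)(r - r0) = 1.5 r0^(-1/2) - 0.5 r0^(-3/2) r. A short Python
check, with a function name and an r0 parameter that are ours:

```python
import math


def taylor_inv_sqrt(r, r0=0.7):
    """First-order Taylor expansion of 1/sqrt(r) around r0."""
    a = 1.5 * r0 ** -0.5   # constant term: f(r0) - f'(r0) * r0
    b = 0.5 * r0 ** -1.5   # magnitude of the slope: -f'(r0)
    return a - b * r


print(1.5 * 0.7 ** -0.5, 0.5 * 0.7 ** -1.5)
for r in (0.5, 0.7, 0.9):
    print(r, 1.0 / math.sqrt(r), taylor_inv_sqrt(r))
```

For r0 = 0.7 the two coefficients agree with 1.79284291400159 and
0.85373472095314 in the listing. Because 1/sqrt(r) is convex, the tangent
line never overestimates, so the approximation slightly shortens gradients
whose squared length is far from 0.7.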

Figure 3: Mapping from a 2-D square to the boundary of a 3-D octahedron. Blue
points in the quadrant x > 0, y > 0 where |x| + |y| < 1 map to the face
x,y,z > 0, while red points where |x| + |y| > 1 map to the opposite face
x,y,z < 0.

5 Rank ordering 

Simplex noise uses a two step process to determine which simplex contains a
point p. First, the N-simplex grid is transformed to an axis-aligned grid of
N-cubes, each containing N! simplices. The determination of which cube
contains p only requires computing the integer part of the transformed
coordinates. Then, the coordinates relative to the origin of the cube are
computed by inverse transforming the fractional part of the transformed
coordinates, and a rank ordering is used to determine which simplex contains
p. Rank ordering is the first stage of the unusual but classic rank sorting
algorithm, where the values are first ranked and then rearranged into their
sorted order. Rank ordering can be performed efficiently by pair-wise
comparisons of components of p. Two components can be ranked by a single
comparison, three components by three comparisons, and four components by six
comparisons. In GLSL, up to four comparisons can be performed in parallel
using vector operations. The ranking can be determined in a reasonably
straightforward manner from the results of these comparisons. The rank
ordering approach was used in a roundabout way in the software 4D noise
implementation of [6] and the GLSL implementation of [5], later improved and
generalized by contributions from Bill Licea-Kane at AMD (then ATI). The 3D
noise of [3] and Perlin's original software implementation presented in [2]
instead use a decision tree of conditionals. For details on the rank ordering
algorithm used for 3-D and 4-D simplex noise, which generalizes to N-D, we
refer to the supplementary material.
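
The idea of ranking by pair-wise comparisons can be sketched on the CPU as
follows. This is a scalar Python illustration under our own naming
(rank_order, simplex_corners), not the vectorized GLSL from the supplementary
material. It assumes the usual simplex traversal, in which the walk from the
first to the last corner of the cube steps along the axis with the largest
component first.

```python
from itertools import combinations


def rank_order(values):
    """Rank components by pair-wise comparisons: 0 for the smallest."""
    ranks = [0] * len(values)
    for i, j in combinations(range(len(values)), 2):
        # Exactly one of the pair gains a rank, so ties are broken
        # consistently and the ranks are always a permutation of 0..N-1.
        if values[i] > values[j]:
            ranks[i] += 1
        else:
            ranks[j] += 1
    return ranks


def simplex_corners(ranks):
    """Offsets of the intermediate simplex corners, one more axis set to 1
    at each step, in order of decreasing rank."""
    n = len(ranks)
    return [tuple(1 if r >= n - k else 0 for r in ranks) for k in range(1, n)]


print(rank_order([0.3, 0.9, 0.1]))                    # [1, 2, 0]
print(rank_order([0.2, 0.2, 0.7, 0.1]))               # [1, 2, 3, 0]
print(simplex_corners(rank_order([0.3, 0.9, 0.1])))   # [(0, 1, 0), (1, 1, 0)]
```

The loop performs N(N-1)/2 comparisons, which gives the counts of one, three
and six quoted above for two, three and four components, and it contains no
data-dependent branching beyond selecting which counter to increment.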

6 Performance and source code

The performance of the presented algorithms is good, as shown in Table 1.
With reasonably recent GPU hardware, 2-D noise runs at a speed of several
billion samples per second. 3-D noise attains about half that speed, and 4-D
noise is somewhat slower still, with a clear speed advantage for 3-D and 4-D
simplex noise compared to classic noise. All variants are fast enough to be
considered for practical use on current GPU hardware.

Procedural texturing scales better than traditional texturing with massive
amounts of parallel execution units, because it is not dependent on texture
bandwidth. Looking at recent generations of GPUs, parallelism seems to
increase more rapidly than texture bandwidth. Also, embedded GPU
architectures designed for OpenGL ES 2.x have limited texture resources and
may benefit from procedural noise despite their relatively low performance.

The full GLSL source code for 2D simplex noise is quite compact, as presented
in Table 3. For the gradient mapping, this particular implementation wraps
the integer range {0 ... 288} repeatedly to the range {0 ... 40} by a
modulo-41 operation. 41 has no common prime factors with 289, which improves
the shuffling, and 41 is reasonably close to an even divisor of 289, which
creates a good isotropic distribution for the gradients.
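
The two properties of 41 mentioned here are easy to check. The following
Python lines are only a worked illustration of the arithmetic and are not
part of the noise code.

```python
from collections import Counter
from math import gcd

# How often each gradient index 0..40 is hit when 0..288 is wrapped mod 41.
counts = Counter(k % 41 for k in range(289))

print(gcd(41, 289))                                 # 1
print(divmod(289, 41))                              # (7, 2)
print(min(counts.values()), max(counts.values()))   # 7 8
```

Since 289 = 7 * 41 + 2, the range wraps seven full times with two values left
over, so gradient indices 0 and 1 are selected eight times and all others
seven times, which is close to uniform.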

Counting vector operations as a single operation, this code amounts to just
six dot operations, three mod, two floor, one each of step, max, fract and
abs, seventeen multiplications and nineteen additions. The supplementary
material contains source code for 2-D, 3-D and 4-D simplex noise, classic
Perlin noise and a periodic version of classic noise with an explicitly
specified arbitrary integer period, to match the popular and useful pnoise()
function in RenderMan SL. The source code is licensed under the MIT license.
Attribution is required where substantial portions of the work are used, but
there are no other limits on commercial use or modifications. Managed and
tracked code and a cross-platform benchmark and demo for Linux, MacOS X and
Windows can be downloaded from the public git repository
git@github.com:ashima/webgl-noise.git, reachable also by:

http://www.github.com/ashima/webgl-noise

7 Old versus new 

The described noise implementations are fundamentally different from previous
work, in that they use no lookup tables at all. The advantage is that they
scale very well with massive parallelism and are not dependent on texture
memory bandwidth. The lack of lookup tables makes them suitable for a VLSI
hardware implementation in silicon, and they can be used in vertex shader
environments where texture lookup is not guaranteed to be available, as in
the baseline OpenGL ES 2.0 and WebGL 1.0 profiles.

In terms of performance, this purely computational noise is not quite as
fast on current GPUs as the previous implementation by Gustavson [5], which
made heavy use of 2-D texture lookups both for permutations and gradient
generation. Most real time graphics of today is very texture intensive, and
modern GPU architectures are designed to have a high texture bandwidth.
However, it should be noted that noise is mostly just one component of a
shader, and a computational noise algorithm can make good use of unutilised
ALU resources in an otherwise texture intensive shader. Furthermore, we
consider the simplicity that comes from independence of external data to be
an advantage in itself.

                 Const     Simplex noise           Classic noise
GPU              color      2D      3D      4D      2D      3D      4D
Nvidia
  GF7600GS       3,399     162      72      39     180      43      16
  GTX260         8,438   1,487     784     426   1,477     589     255
  GTX480         8,841   3,584   1,902   1,149   3,489   1,508     681
  GTX580        13,863   4,676   2,415   1,429   4,675   2,003     906
AMD
  HD3650         1,974     370     193     117     320     147      67
  HD4850         9,416   2,586   1,320     821   2,142     992     457
  HD5870        18,061   4,980   3,062   2,006   4,688   2,211   1,092

Table 1: Performance benchmarks for selected GPUs, in Msamples per second

A side by side comparison of the new implementation against the previous
implementation is presented in Table 2. The old implementation is roughly
twice as fast as our purely computational version, although the gap appears to
be closing with more recent GPU models with better computing power. It is
worth noting that 4D classic noise needs 16 pseudo-random gradients, which
requires 64 simple quadratic polynomial evaluations and 16 gradient mappings
in our new implementation, and a total of 48 2-D texture lookups in the previous
implementation. The fact that the old version is faster despite its very heavy
use of texture lookups shows that current GPUs are very clearly designed for
streamlining texture memory accesses.

8 Supplementary material 

http://www.itn.liu.se/~stegu/jgt2011/supplement.pdf

                 Const     Simplex noise           Classic noise
GPU, version     color      2D      3D      4D      2D      3D      4D
Nvidia
  GTX260 new     8,438   1,487     784     426   1,477     ...     ...
AMD
  HD3650 new     1,974     370     193     117     320     147      67
  HD3650 old               665     413     241     871     333     139
  HD4850 new     9,416   2,586     ...     ...   2,142     992     ...
  HD4850 old       ...     ...     ...     ...     ...     ...     ...

Table 2: Side by side comparison of the new and the previous implementation.
Cells marked ... are missing from this copy. One further value, 1,815,
survives between the GTX260 and HD3650 rows without a recoverable row or
column.

// 2D simplex noise
#version 120

vec3 permute(vec3 x) {
  return mod(((x * 34.0) + 1.0) * x, 289.0);
}

vec3 taylorInvSqrt(vec3 r) {
  return 1.79284291400159 - 0.85373472095314 * r;
}

float snoise(vec2 P) {
  const vec2 C = vec2(0.211324865405187134,  // (3.0-sqrt(3.0))/6.0;
                      0.366025403784438597); // 0.5*(sqrt(3.0)-1.0);
  // First corner
  vec2 i  = floor(P + dot(P, C.yy));
  vec2 x0 = P - i + dot(i, C.xx);
  // Other corners
  vec2 i1;
  i1.x = step(x0.y, x0.x);  // 1.0 if x0.x >= x0.y, else 0.0
  i1.y = 1.0 - i1.x;
  // x1 = x0 - i1 + 1.0 * C.xx; x2 = x0 - 1.0 + 2.0 * C.xx;
  vec4 x12 = x0.xyxy + vec4(C.xx, C.xx * 2.0 - 1.0);
  x12.xy -= i1;
  // Permutations
  i = mod(i, 289.0);  // Avoid truncation in polynomial evaluation
  vec3 p = permute(permute(i.y + vec3(0.0, i1.y, 1.0))
                   + i.x + vec3(0.0, i1.x, 1.0));
  // Circularly symmetric blending kernel
  vec3 m = max(0.5 - vec3(dot(x0, x0), dot(x12.xy, x12.xy),
                          dot(x12.zw, x12.zw)), 0.0);
  m = m * m;
  m = m * m;
  // Gradients from 41 points on a line, mapped onto a diamond
  vec3 x  = fract(p * (1.0 / 41.0)) * 2.0 - 1.0;
  vec3 gy = abs(x) - 0.5;
  vec3 ox = floor(x + 0.5);  // round(x) is a GLSL 1.30 feature
  vec3 gx = x - ox;
  // Normalise gradients implicitly by scaling m
  m *= taylorInvSqrt(gx * gx + gy * gy);
  // Compute final noise value at P
  vec3 g;
  g.x  = gx.x * x0.x + gy.x * x0.y;
  g.yz = gx.yz * x12.xz + gy.yz * x12.yw;
  // Scale output to span range [-1,1]
  // (scaling factor determined by experiments)
  return 130.0 * dot(m, g);
}



Table 3: Complete, self-contained source code for 2D simplex noise. Code for 2D, 3D 
and 4D versions of classic and simplex noise is in the supplementary material and in 
the online repository. 
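
For checking a shader against known values, it can help to have a scalar CPU
version of the same arithmetic. The following Python port of the 2-D listing
is a sketch under stated assumptions: it processes the three corners in a
loop instead of using vector types, it runs in double precision rather than
single precision, and it divides by 41.0 instead of multiplying by the
reciprocal. Its output is therefore not expected to match a GPU bit for bit.
The names C_X and C_Y are ours.

```python
import math

C_X = (3.0 - math.sqrt(3.0)) / 6.0   # unskew factor, C.x in the listing
C_Y = 0.5 * (math.sqrt(3.0) - 1.0)   # skew factor, C.y in the listing


def permute(x):
    """(34 x^2 + x) mod 289, written as in the listing's permute()."""
    return ((x * 34.0) + 1.0) * x % 289.0


def snoise(px, py):
    """Scalar CPU sketch of the 2-D simplex noise listing."""
    # First corner: skew P to find the cell, then unskew to get offset x0.
    s = (px + py) * C_Y
    ix, iy = math.floor(px + s), math.floor(py + s)
    t = (ix + iy) * C_X
    x0 = (px - ix + t, py - iy + t)
    # Second corner: step along x first if x0.x >= x0.y, else along y.
    i1 = (1.0, 0.0) if x0[0] >= x0[1] else (0.0, 1.0)
    x1 = (x0[0] - i1[0] + C_X, x0[1] - i1[1] + C_X)
    x2 = (x0[0] - 1.0 + 2.0 * C_X, x0[1] - 1.0 + 2.0 * C_X)
    # Wrap lattice coordinates before hashing.
    ix, iy = ix % 289.0, iy % 289.0
    total = 0.0
    for (ox, oy), (dx, dy) in zip(((0.0, 0.0), i1, (1.0, 1.0)), (x0, x1, x2)):
        p = permute(permute(iy + oy) + ix + ox)
        # Circularly symmetric blending kernel, (0.5 - |d|^2)^4 or zero.
        m = max(0.5 - (dx * dx + dy * dy), 0.0) ** 4
        # Gradient from 41 points on a line, mapped onto a diamond.
        x = (p / 41.0) % 1.0 * 2.0 - 1.0
        gy = abs(x) - 0.5
        gx = x - math.floor(x + 0.5)
        # Approximate normalization folded into the kernel weight.
        m *= 1.79284291400159 - 0.85373472095314 * (gx * gx + gy * gy)
        total += m * (gx * dx + gy * dy)
    return 130.0 * total


print(snoise(0.0, 0.0))
print(snoise(1.7, -3.2))
```

At a lattice point such as the origin, the offset to the nearest corner is
zero and the other two corners lie outside the kernel radius, so the noise
value there is zero.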



